pacman::p_load(sf, sfdep, tmap, tidyverse, RColorBrewer, ggplot2, spatstat, jsonlite, units, matrixStats, httr)Take Home Exercise 3
1. Overview
1.1 Introduction
1.2 My Responsibilities
- Data Preparation, Preprocessing
1.3 Importing Packages
Here, we have loaded the following packages:
2. The Data
For this project, we will be using the following data sets.
Singapore Rental Flat Prices (Jan-17 to Sep-24) from data.gov.sg
Master Plan 2014 Subzone Boundary (Web) from data.gov.sg
Hawker Centres Dataset from data.gov.sg
Kindergarten, Childcare Datasets from OneMap API
Bus Stops Location, MRT/ LRT Locations from LTA Data Mall
Shopping Mall Coordinates through wikipedia and webscraping with the coordinates retrieved through OneMap API
2.1 Importing Geospatial Data
2.1.1 Importing Singapore Subzone Boundaries
The code chunk below is used to import MP_SUBZONE_WEB_PL shapefile by using st_read() of sfpackages.
mpsz_sf <- st_read(dsn = "data/geospatial/MasterPlan2014SubzoneBoundaryWebSHP/", layer = "MP14_SUBZONE_WEB_PL")Reading layer `MP14_SUBZONE_WEB_PL' from data source
`/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex03/data/geospatial/MasterPlan2014SubzoneBoundaryWebSHP'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
write_rds(mpsz_sf, 'data/rds/mpsz_sf.rds')st_crs(mpsz_sf)Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
2.1.1.1 Checking Validity of Geometries
Using st_is_valid, we can check to see whether all the polygons are valid or not. From the results, we can see a total of 9 not valid.
# checks for the number of geometries that are invalid
length(which(st_is_valid(mpsz_sf) == FALSE))[1] 9
To rectify this, we can use st_make_valid() to correct these invalid geometries as demonstrated in the code chunk below.
mpsz_sf <- st_make_valid(mpsz_sf)
length(which(st_is_valid(mpsz_sf) == FALSE))[1] 0
Here is a plot of Singapore.
plot(mpsz_sf)2.1.2 Importing Kindergartens
Click to view the code
kindergarten_json <- fromJSON("data/geospatial/kindergartens.json")
kindergarten_cleaned <- kindergarten_json$SrchResults[-1, ]
kindergarten_df <- data.frame(
NAME = kindergarten_cleaned$NAME,
latitude = sapply(kindergarten_cleaned$LatLng, function(x) as.numeric(unlist(strsplit(x, ","))[1])),
longitude = sapply(kindergarten_cleaned$LatLng, function(x) as.numeric(unlist(strsplit(x, ","))[2]))
)
kindergarten_sf <- kindergarten_df %>%
st_as_sf(coords = c("longitude", "latitude"), crs=4326) %>%
st_transform(crs = 3414)head(kindergarten_sf)Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 19307.62 ymin: 38859.25 xmax: 30705.05 ymax: 46300.26
Projected CRS: SVY21 / Singapore TM
NAME
1 PCF Sparkletots Preschool @ Cheng San-Seletar Blk 435 (KN)
2 PCF Sparkletots Preschool @ Cheng San-Seletar Blk 533 (KN)
3 PCF Sparkletots Preschool @ Cheng San-Seletar Blk 556 (DS)
4 PCF Sparkletots Preschool @ Chong Pang Blk 107 (KN)
5 PCF Sparkletots Preschool @ Chong Pang Blk 122 (KN)
6 PCF Sparkletots Preschool @ Chua Chu Kang Blk 10 (KN)
geometry
1 POINT (30325.45 38859.25)
2 POINT (30190.51 39574.18)
3 POINT (30705.05 39337.9)
4 POINT (27354.73 45992.92)
5 POINT (27755.87 46300.26)
6 POINT (19307.62 40271.08)
2.1.3 Importing Childcare
Click to view the code
childcare_json <- fromJSON("data/geospatial/childcare.json")
childcare_cleaned <- childcare_json$SrchResults[-1, ]
childcare_df <- data.frame(
NAME = childcare_cleaned$NAME,
latitude = sapply(childcare_cleaned$LatLng, function(x) as.numeric(unlist(strsplit(x, ","))[1])),
longitude = sapply(childcare_cleaned$LatLng, function(x) as.numeric(unlist(strsplit(x, ","))[2]))
)
childcare_sf <- childcare_df %>%
st_as_sf(coords = c("longitude", "latitude"), crs=4326) %>%
st_transform(crs = 3414)head(childcare_sf)Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 17828.84 ymin: 29221.89 xmax: 40985.94 ymax: 45530.47
Projected CRS: SVY21 / Singapore TM
NAME geometry
1 APOLLO INTERNATIONAL PRESCHOOL PRIVATE LIMITED POINT (40985.94 33848.38)
2 APPLE TREE PLAYHOUSE PTE LTD POINT (28308.65 45530.47)
3 Appleland Montessori Child Care Centre Pte Ltd POINT (17828.84 36607.36)
4 APPLELAND PLAYHOUSE POINT (25579.73 29221.89)
5 APRICOT ACADEMY (LAGUNA) PTE. LTD. POINT (38981.02 32483.41)
6 Arise Preschool POINT (21588.47 36307)
2.1.4 Importing Hawker Centre
Similarly here, we will usest_read to read the geojson information, however since the columns values are in the format of `<th>
` etc we will need to use a regex which is what we have done below to extract the name of the hawker centres.
Click to view the code
hawker_sf <- st_read('data/geospatial/HawkerCentresGEOJSON.geojson')Reading layer `HawkerCentresGEOJSON' from data source
`/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex03/data/geospatial/HawkerCentresGEOJSON.geojson'
using driver `GeoJSON'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449017
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Click to view the code
# Function to extract name from description
extract_name <- function(description) {
if (!is.na(description)) {
# Use regular expression to extract the NAME
name <- sub(".*<th>NAME</th> <td>(.*?)</td>.*", "\\1", description)
if (name == description) {
return(NA) # Return NA if no match is found
}
return(name)
} else {
return(NA)
}
}
# Apply the extraction function to every row
hawker_sf <- hawker_sf %>%
mutate(Name = sapply(Description, extract_name)) %>% select (-Description)head(hawker_sf)Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.7798 ymin: 1.284425 xmax: 103.9048 ymax: 1.449017
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Name geometry
1 Market Street Hawker Centre POINT Z (103.8502 1.284425 0)
2 Marsiling Mall Hawker Centre POINT Z (103.7798 1.433539 0)
3 Margaret Drive Hawker Centre POINT Z (103.8047 1.297486 0)
4 Fernvale Hawker Centre & Market POINT Z (103.8771 1.391592 0)
5 One Punggol Hawker Centre POINT Z (103.9048 1.40819 0)
6 Bukit Canberra Hawker Centre POINT Z (103.8225 1.449017 0)
As shown above, we can see that the geographic coordinate system for the hawker dataset is in WGS84 and has XYZ coordinates, among which contains the Z-coordinates we do not need. Thus, we can use st_zm() to remove the Z-coordinate and project it to the SVY21 coordiate system using st_transform().
hawker_sf <- st_zm(hawker_sf) %>%
st_transform(crs = 3414)
head(hawker_sf)Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22042.51 ymin: 29650.7 xmax: 35955.52 ymax: 47850.43
Projected CRS: SVY21 / Singapore TM
Name geometry
1 Market Street Hawker Centre POINT (29874.82 29650.7)
2 Marsiling Mall Hawker Centre POINT (22042.51 46139.03)
3 Margaret Drive Hawker Centre POINT (24816.7 31094.91)
4 Fernvale Hawker Centre & Market POINT (32867.9 41500.77)
5 One Punggol Hawker Centre POINT (35955.52 43336.13)
6 Bukit Canberra Hawker Centre POINT (26794.39 47850.43)
Click to view the code
tmap_mode('plot')
tm_shape(mpsz_sf) +
tm_polygons()+
tm_shape(hawker_sf) +
tm_dots(col='red')2.1.5 Importing Bus Stops
busstop_sf <- st_read(dsn = "data/geospatial/BusStopLocation_Jul2024/", layer = "BusStop")%>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex03/data/geospatial/BusStopLocation_Jul2024'
using driver `ESRI Shapefile'
Simple feature collection with 5166 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48285.52 ymax: 52983.82
Projected CRS: SVY21
head(busstop_sf)Simple feature collection with 6 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 20134.2 ymin: 31253.63 xmax: 35565.66 ymax: 41659.52
Projected CRS: SVY21 / Singapore TM
BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
1 65059 B12 ST ANNE'S CH POINT (35565.66 41659.52)
2 16171 B06 YUSOF ISHAK HSE POINT (21439.91 31253.63)
3 61101 NIL BLK 120 POINT (31381.06 35313.49)
4 01239 B01 SULTAN PLAZA POINT (31152.55 31688.08)
5 17269 B01 BLK 730 POINT (20134.2 31917.38)
6 11291 B17 COLD STORAGE JELITA POINT (22723.57 33349.85)
2.1.6 Importing Shopping Malls
shoppingmall_sf <- read_csv('data/geospatial/shopping_mall_coordinates.csv') %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
st_transform(crs = 3414)head(shoppingmall_sf)Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 28561.14 ymin: 28563.01 xmax: 31449.41 ymax: 34203.18
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 2
`Mall Name` geometry
<chr> <POINT [m]>
1 100 AM (29129.86 28563.01)
2 313@Somerset (28561.14 31485.08)
3 Aperia (31449.41 32531.06)
4 Balestier Hill Shopping Centre (29029.77 34203.18)
5 Bugis Cube (30483.66 31167.35)
6 Bugis Junction (30458.71 31274.83)
2.1.7 Importing MRT
mrt_sf <- st_read(dsn = "data/geospatial/TrainStation_Jul2024/", layer = "RapidTransitSystemStation")Reading layer `RapidTransitSystemStation' from data source
`/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex03/data/geospatial/TrainStation_Jul2024'
using driver `ESRI Shapefile'
Simple feature collection with 230 features and 5 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 6068.209 ymin: 27478.44 xmax: 45377.5 ymax: 47913.58
Projected CRS: SVY21
Having imported the dataset, we will now need to check for both invalid geometries and NA values before proceeding. The chunk of code detects not only these but also resolves it.
# Check for invalid geometries and NA values
validity_checks <- st_is_valid(mrt_sf, reason = TRUE)
# Identify indices with NA
na_indices <- which(is.na(validity_checks))
# Filter out rows with NA values from the mrt object
mrt_sf <- mrt_sf[-na_indices, ]
# Verify the mrt object no longer contains invalid geometries
any(is.na(sf::st_is_valid(mrt_sf)))[1] FALSE
Here we use st_transform() to convert it to the SVY21 Coordinates System of CRS code 3414.
mrt_sf <- mrt_sf %>%
st_transform(crs = 3414)tmap_mode('plot')
tm_shape(mpsz_sf)+
tm_polygons() +
tm_shape(mrt_sf) +
tm_dots(col='red')2.1.8 Importing Primary School
This chunk of code imports the primary school dataset from data.gov.sg and uses the select() function to select the relevant columns through the input of the column numbers.
primarysch_df = read_csv('data/geospatial/Generalinformationofschools.csv') %>% filter(mainlevel_code =='PRIMARY') %>% select(1,3,4)2.1.8.1 Geocoding Primary School Data using OneMap API
Click to view the code
geocode <- function(address, postal) {
base_url <- "https://www.onemap.gov.sg/api/common/elastic/search"
query <- list("searchVal" = address,
"postal" = postal,
"returnGeom" = "Y",
"getAddrDetails" = "N",
"pageNum" = "1")
res <- GET(base_url, query = query)
restext<-content(res, as="text")
output <- fromJSON(restext) %>%
as.data.frame %>%
select(results.LATITUDE, results.LONGITUDE)
return(output)
}Click to view the code
primarysch_df$LATITUDE <- 0
primarysch_df$LONGITUDE <- 0
for (i in 1:nrow(primarysch_df)){
temp_output <- geocode(primarysch_df[i, 2], primarysch_df[i, 3])
print(i)
primarysch_df$LATITUDE[i] <- temp_output$results.LATITUDE
primarysch_df$LONGITUDE[i] <- temp_output$results.LONGITUDE
}
write_rds(primarysch_df, 'data/rds/geocoded_primarysch.rds')As shown below, using head() we can see that the new columns for lat and long has been added with the values fetched using the OneMap API.
head(primarysch_df)# A tibble: 6 × 3
school_name address postal_code
<chr> <chr> <dbl>
1 ADMIRALTY PRIMARY SCHOOL 11 WOODLANDS CIRCLE 738907
2 AHMAD IBRAHIM PRIMARY SCHOOL 10 YISHUN STREET 11 768643
3 AI TONG SCHOOL 100 Bright Hill Drive 579646
4 ALEXANDRA PRIMARY SCHOOL 2A Prince Charles Crescent 159016
5 ANCHOR GREEN PRIMARY SCHOOL 31 Anchorvale Drive 544969
6 ANDERSON PRIMARY SCHOOL 19 ANG MO KIO AVE 9 569785
Using read_rds, we can access the already processed and geocoded data from rds without needing to run through the geocoding function again. Since the data is in the WGS coordinate system, we can use st_transform() to project it to the SVY21 coordinate system we will be using.
primarysch_df <- read_rds('data/rds/geocoded_primarysch.rds')
primarysch_sf <- primarysch_df %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
st_transform(crs = 3414)Click to view the code
tmap_mode('plot')
tm_shape(mpsz_sf)+
tm_polygons() +
tm_shape(primarysch_sf) +
tm_dots(col='red')2.1.9 Inferring CBD
Finally, let us factor in the proximity to the Central Business District - in the Downtown Core. For this, let us take the coordinates of Downtown Core to be the coordinates of the CBD:
lat <- 1.287953
lng <- 103.851784
cbd_sf <- data.frame(lat, lng) %>%
st_as_sf(coords = c("lng", "lat"), crs=4326) %>%
st_transform(crs=3414)2.2 Importing Aspatial Data
2.2.1 Importing Rental Flat
The code chunk below is used to import the rental
rental_df = read_csv('data/aspatial/RentingOutofFlats2024CSV.csv')To get a brief overview of existing columns of this dataset, we can use colnames() to do so.
colnames(rental_df)[1] "rent_approval_date" "town" "block"
[4] "street_name" "flat_type" "monthly_rent"
2.2.1.1 Converting rent_approval_date to a Valid Date Format
rental_df$rent_approval_date <- ym(rental_df$rent_approval_date)2.2.1.2 Filtering For Selected Time Frame
rental_df <- rental_df %>%
filter(year(rent_approval_date) == 2024)2.2.1.3 Geocoding Rental Flat Data Using OneMap API
geocode <- function(block, streetname) {
base_url <- "https://www.onemap.gov.sg/api/common/elastic/search"
address <- paste(block, streetname, sep = " ")
query <- list("searchVal" = address,
"returnGeom" = "Y",
"getAddrDetails" = "N",
"pageNum" = "1")
res <- GET(base_url, query = query)
restext<-content(res, as="text")
output <- fromJSON(restext) %>%
as.data.frame %>%
select(results.LATITUDE, results.LONGITUDE)
return(output)
}rental_df$LATITUDE <- 0
rental_df$LONGITUDE <- 0
for (i in 1:nrow(rental_df)){
temp_output <- geocode(rental_df[i, 3], rental_df[i, 4])
print(i)
rental_df$LATITUDE[i] <- temp_output$results.LATITUDE
rental_df$LONGITUDE[i] <- temp_output$results.LONGITUDE
}
write_rds(rental_df, 'data/rds/geocoded_rental_2024.rds')rental_df <- read_rds('data/rds/geocoded_rental_2024.rds')2.2.1.4 CRS Adjustments
Another important step after importing the dataset is checking the coordinate system used, as seen in the result below using st_crs(), we can see that there is no CRS.
st_crs(rental_df)Coordinate Reference System: NA
Therefore, we need to convert the longitude and latitude columns into a spatial format. Since our dataset is based in Singapore and it uses the SVY21 coordinate reference system (CRS Code: 3414), we will use the st_transform() function to perform the conversion and create the geometry column.
rental_sf <- rental_df %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
st_transform(crs = 3414)Using st_crs(), we can check and verify that the conversion is successful.
st_crs(rental_sf)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
head(rental_sf)Simple feature collection with 6 features and 6 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 15786.61 ymin: 30769.77 xmax: 39668.39 ymax: 45634.94
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 7
rent_approval_date town block street_name flat_type monthly_rent
<date> <chr> <chr> <chr> <chr> <dbl>
1 2024-01-01 YISHUN 386 YISHUN RING RD 4-ROOM 3700
2 2024-01-01 JURONG WEST 140B CORPORATION DR 4-ROOM 3900
3 2024-01-01 SENGKANG 471B FERNVALE ST 5-ROOM 3700
4 2024-01-01 KALLANG/WHAMPOA 10 GLOUCESTER RD 3-ROOM 3600
5 2024-01-01 BEDOK 31 BEDOK STH AVE… 5-ROOM 4350
6 2024-01-01 QUEENSTOWN 82 STRATHMORE AVE 4-ROOM 3000
# ℹ 1 more variable: geometry <POINT [m]>
tmap_mode('plot')
tm_shape(mpsz_sf) +
tm_polygons()+
tm_shape(rental_sf) +
tm_dots()2.2.1.5 Checking for NA values
This chunk of code checks the dataset for any na values in all of the columns. As shown below, there is none.
rental_sf %>%
summarise(across(everything(), ~ sum(is.na(.)))) -> extra_NA
extra_NASimple feature collection with 1 feature and 6 fields
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
Projected CRS: SVY21 / Singapore TM
# A tibble: 1 × 7
rent_approval_date town block street_name flat_type monthly_rent
<int> <int> <int> <int> <int> <int>
1 0 0 0 0 0 0
# ℹ 1 more variable: geometry <MULTIPOINT [m]>
3. Data Wrangling
3.1 Removal of Redundant Columns
# Define columns to be removed
columns_to_remove <- c("block","street_name")
# Remove columns only if they exist in the dataframe
rental_sf <- rental_sf %>%
dplyr::select(-all_of(columns_to_remove[columns_to_remove %in% names(rental_sf)]))3.2 Filter By Flat Type
Let us get an overview of the distributions of the housing types. As shown in the histogram, we can see that there is significantly less data for flat types like 1-room, 2-room, and executive housing.
# Create a summary of counts for each remaining lease range
count_data <- rental_sf %>%
group_by(flat_type) %>%
summarise(count = n())
# Create the bar plot with labels
ggplot(count_data, aes(x = flat_type, y = count)) +
geom_bar(stat = "identity", fill = "steelblue") +
geom_text(aes(label = count), vjust = -0.5, size = 4) + # Add labels on top of the bars
labs(title = "Count of Flat Type",
x = "Flat Type",
y = "Count") +
theme_minimal()Hence, we will focus on analyzing the 3-room, 4-room, and 5-room flats since they show a more substantial presence in the dataset compared to smaller flat types.
rental_sf <- rental_sf %>% filter (flat_type == '3-ROOM' | flat_type == '4-ROOM' |flat_type == '5-ROOM' )3.3 Calculate Number of Amenities Within 1km & Proximity To Nearest Amenity
Click to view the code
calculate_amenities_and_proximity <- function(dataset1, dataset2, name_of_col_amenities, name_of_col_proximity, radius, calculateNumberOfAmenities) {
# Calculate distance matrix
dist_matrix <- st_distance(dataset1, dataset2) %>%
drop_units()
if (calculateNumberOfAmenities){
# Calculate the number of amenities within the specified radius
dataset1[[name_of_col_amenities]] <- rowSums(dist_matrix <= radius)
}
# Calculate the proximity to the nearest amenity
dataset1[[name_of_col_proximity]] <- rowMins(dist_matrix)
return(dataset1)
}Click to view the code
rental_sf <-
calculate_amenities_and_proximity(
rental_sf, kindergarten_sf, "no_of_kindergarten_500m", "prox_kindergarten", 500, TRUE
) %>%
calculate_amenities_and_proximity(
., childcare_sf, "no_of_childcare_500m", "prox_childcare", 500, TRUE
) %>%
calculate_amenities_and_proximity(
., hawker_sf, "no_of_hawker_500m", "prox_hawker", 500, TRUE
) %>%
calculate_amenities_and_proximity(
., busstop_sf, "no_of_busstop_500m", "prox_busstop", 500, TRUE
) %>%
calculate_amenities_and_proximity(
., shoppingmall_sf, "no_of_shoppingmall_1km", "prox_shoppingmall", 1000, TRUE
) %>%
calculate_amenities_and_proximity(
., mrt_sf, "x", "prox_mrt", 1000, FALSE
) %>%
calculate_amenities_and_proximity(
., primarysch_sf, "x", "prox_prisch", 1000, FALSE
) %>%
calculate_amenities_and_proximity(
., cbd_sf, "x", "prox_cbd", 1000, FALSE
)
# Writing to RDS
write_rds(rental_sf,'data/rds/rental_sf.rds')rental_sf <- read_rds('data/rds/rental_sf.rds')
head(rental_sf)Simple feature collection with 6 features and 17 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 15786.61 ymin: 30769.77 xmax: 39668.39 ymax: 45634.94
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 18
rent_approval_date town flat_type monthly_rent geometry
<date> <chr> <chr> <dbl> <POINT [m]>
1 2024-01-01 YISHUN 4-ROOM 3700 (29463.95 45634.94)
2 2024-01-01 JURONG WE… 4-ROOM 3900 (15786.61 34389.18)
3 2024-01-01 SENGKANG 5-ROOM 3700 (33244.8 42043.31)
4 2024-01-01 KALLANG/W… 3-ROOM 3600 (30102.41 32911.75)
5 2024-01-01 BEDOK 5-ROOM 4350 (39668.39 33918.01)
6 2024-01-01 QUEENSTOWN 4-ROOM 3000 (25265.85 30769.77)
# ℹ 13 more variables: no_of_kindergarten_500m <dbl>, prox_kindergarten <dbl>,
# no_of_childcare_500m <dbl>, prox_childcare <dbl>, no_of_hawker_500m <dbl>,
# prox_hawker <dbl>, no_of_busstop_500m <dbl>, prox_busstop <dbl>,
# no_of_shoppingmall_1km <dbl>, prox_shoppingmall <dbl>, prox_mrt <dbl>,
# prox_prisch <dbl>, prox_cbd <dbl>
4. Overview Of Dataset
colnames(rental_sf) [1] "rent_approval_date" "town"
[3] "flat_type" "monthly_rent"
[5] "geometry" "no_of_kindergarten_500m"
[7] "prox_kindergarten" "no_of_childcare_500m"
[9] "prox_childcare" "no_of_hawker_500m"
[11] "prox_hawker" "no_of_busstop_500m"
[13] "prox_busstop" "no_of_shoppingmall_1km"
[15] "prox_shoppingmall" "prox_mrt"
[17] "prox_prisch" "prox_cbd"
Explanatory Variables:
Continuous
Distance to transport:
distance_to_mrt_metersDistance to amenities:
distance_to_pri_school_metersDistance to central business district:
distance_to_cbdFlat Type:
flat_typeProx_to:
Categorical
Remaining Lease:
remaining_lease_rangeStorey Height:
storey_range
Dependent Variables:
- Monthly Rental:
monthly_rental,price_per_sqft
3. Shiny Storyboard (EDA)
3.1 Distribution
3.2
4. Distribution
4.1 Categorical Variables
4.1.1 Housing Type
# Create a summary of counts for each remaining lease range
count_data <- rental_sf %>%
group_by(flat_type) %>%
summarise(count = n())
# Create the bar plot with labels
ggplot(count_data, aes(x = flat_type, y = count)) +
geom_bar(stat = "identity", fill = "steelblue") +
geom_text(aes(label = count), vjust = -0.5, size = 4) + # Add labels on top of the bars
labs(title = "Count of Flat Type",
x = "Flat Type",
y = "Count") +
theme_minimal()Bivariate Analysis
Correlation Matrix
Drafts
mpsz_sf_main <- st_union(mpsz_sf) %>%
st_cast("POLYGON")
mpsz_sf_main <- mpsz_sf_main[c(12)]
mpsz_sf_owin <- as.owin(mpsz_sf_main)plot(mpsz_sf_owin)tmap_mode('plot')
tm_shape(mpsz_sf%>% filter(PLN_AREA_N == 'ANG MO KIO'))+
tm_polygons()+
tm_shape(rental_sf %>% filter(planning_area_ura == 'ANG MO KIO'))+
tm_dots()